Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pass to assign tiles to logical objectFifos #915

Merged
merged 2 commits into from
Nov 21, 2024

Conversation

jtuyls
Copy link
Collaborator

@jtuyls jtuyls commented Nov 21, 2024

Adds a pass that assigns tiles to logical objectFifos based on tile usage by trying to to distribute logical objectFifos as evenly as possible across available tiles. The new pass is based on the tile assignment logic that was originally part of DistributeCoresAndObjectFifos and refactors the latter pass to use the extracted functions to keep the logic the same for now.

Note that I am not completely removing the tile assignment functions from DistributeCoresAndObjectFifos for two reasons:

  • To keep the tests largely the same for now.
  • Because the output of DistributeCoresAndObjectFifos before tile assignment is not stable, i.e. calling cse/canonicalize in between DistributeCoresAndObjectFifos and AssignTiles leads to different functional (incorrect) results.

Until the output of DistributeCoresAndObjectFifos before tile assignment is stable, the tile assignment utility functions should be called within the pass itself and not be separated out.

@@ -605,6 +605,10 @@ void addAMDAIEObjectFifoLoweringPasses(
passManager.addPass(createCSEPass());
passManager.addPass(createCanonicalizerPass());

passManager.addPass(createAMDAIEAssignTilesPass());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this pass supposed to be added to the pipeline here? Thought you wanted to still do everything in SplitLogicalObjFifosForConnectionReusePass

Copy link
Collaborator Author

@jtuyls jtuyls Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pass is now run after SplitLogicalObjFifosForConnectionReusePass, because everything done in that pass (and the one from @yzhang93 we will enable soon) might cause us to want to assign new tile locations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense

@newling
Copy link
Contributor

newling commented Nov 21, 2024

If you want to semi-ensure that 2 passes are always run back-to-back, you can use the aievec approach of adding to the pipeline with a helper function:

void buildLowerVectorToAIEVec(mlir::OpPassManager &pm);

(minus) doesn't technically ensure a user doesn't misuse the pass
(plus) integrates all code into the main pipeline just once
(plus) easier for us to see what transformation happened in the different non-trivial functions with --mlir-print-ir-before-all

@jtuyls
Copy link
Collaborator Author

jtuyls commented Nov 21, 2024

If you want to semi-ensure that 2 passes are always run back-to-back, you can use the aievec approach of adding to the pipeline with a helper function:

void buildLowerVectorToAIEVec(mlir::OpPassManager &pm);

(minus) doesn't technically ensure a user doesn't misuse the pass (plus) integrates all code into the main pipeline just once (plus) easier for us to see what transformation happened in the different non-trivial functions with --mlir-print-ir-before-all

Yeah, but the disadvantage is a real issue for me. A pass not being stable on itself is just an invite for bugs and misuse. The workaround hides the real underlying problem that needs to be tackled, which is ensuring that the output without tile assignment is stable on itself. I have some ideas for this, but I don't think it's a priority right now because the pass has held up pretty well as it stands and if needed, you can easily debug it with --debug-only.

Copy link
Contributor

@yzhang93 yzhang93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. Some minor comments.

@@ -20,7 +20,7 @@
// CHECK: aie.use_lock
// Check a bit of the aiex.runtime_sequence:
// CHECK: aiex.runtime_sequence @matmul_i32()
// CHECK: } {npu_instructions = dense_resource<npu_instructions> : tensor<174xui32>, runtime_sequence_name = "matmul_i32"}
// CHECK: } {npu_instructions = dense_resource<npu_instructions> : tensor<208xui32>, runtime_sequence_name = "matmul_i32"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the meaning of this tensor? Why does the value change?

Copy link
Collaborator Author

@jtuyls jtuyls Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the control code instructions tensor. It was a bit weird that this changed, but I had a look at it and it seems like earlier channels were shared when they shouldn't have been and now they are not anymore, leading to more shim channels being used and therefore, more control code.

Comment on lines 450 to 451
"device-specific information is required to determine when loops "
"can be subsumed into DMA operations, and must be attached to a "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the error message should be changed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch, updated it,

@@ -67,6 +67,11 @@ def AMDAIEAssignPacketIds :
let constructor = "mlir::iree_compiler::AMDAIE::createAMDAIEAssignPacketIdsPass()";
}

def AMDAIEAssignTiles : Pass<"iree-amdaie-assign-tiles", ""> {
let summary = "Assign physical tile locations to logical objFifos.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: objFifos -> objectFifos.


// -----

// Test duplicate global logical objFifos.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe add note to indicate global logical objFifos are in L3?

return success();
}

/// Utility to duplicate global objFifos for each strided copy-like operation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: global objFifos -> global logical objectFifos in L3

Comment on lines 109 to 119
rewriter.setInsertionPoint(copyOp);
auto newTarget = rewriter.create<AMDAIE::LogicalObjectFifoFromMemrefOp>(
rewriter.getUnknownLoc(),
cast<LogicalObjectFifoType>(target.getOutput().getType()),
target.getMemref());
rewriter.replaceUsesWithIf(
target.getOutput(), newTarget.getOutput(), [&](OpOperand &use) {
return use.getOwner() == copyOp.getOperation();
});
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are almost duplicate codes as the above. Maybe extract these as a separate function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 347 to 353
AMDAIE::TileOp assignedTileOp =
*std::min_element(tiles.begin(), tiles.end(), tileLocAndUsageCmp);

// Increase usage of the chosen tile.
int64_t col = getConstantIndexOrAssert(assignedTileOp.getCol());
int64_t row = getConstantIndexOrAssert(assignedTileOp.getRow());
tileLocToUsage[TileLoc(col, row)] += 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more comments to explain these?

Copy link
Contributor

@newling newling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, no blocking comments from me

tiles.erase(std::unique(tiles.begin(), tiles.end()), tiles.end());
for (AMDAIE::TileOp tile : tiles) {
std::optional<int64_t> column = getConstantIntValue(tile.getCol());
if (!column) return tile.emitOpError() << "found non-constant column";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An error without resulting signal of pass failure. But this isn't new in this PR, so not requesting a change on this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I changed this to notifyMatchFailure.

parentOp->emitOpError() << "failed duplicating global object fifos";
return signalPassFailure();
}
if (failed(verify(parentOp, true))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question 1: The additional prints and verifications: I know they're copied from another pass, but do you think maybe they're not needed here? Because they're not in other passes. They suggest to me that this pass is still not to be considered stable, is that accurate?

Question 2: I assume duplicateGlobalObjFifos is something you explicitly do not want to use in DistributeCoresAndObjectFifos` ? If that's not the case, you could have moved this copied code into another util function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question 1: The additional prints and verifications: I know they're copied from another pass, but do you think maybe they're not needed here? Because they're not in other passes. They suggest to me that this pass is still not to be considered stable, is that accurate?

Yeah, good point, I think this part is actually stable and they can be removed.

Question 2: I assume duplicateGlobalObjFifos is something you explicitly do not want to use in DistributeCoresAndObjectFifos` ? If that's not the case, you could have moved this copied code into another util function.

The reason for this was mainly to keep the behaviour of DistributeCoresAndObjectFifos as close as possible to what it was to avoid more test changes. Maybe this could be tried and changed in a follow-up.

@@ -67,6 +67,11 @@ def AMDAIEAssignPacketIds :
let constructor = "mlir::iree_compiler::AMDAIE::createAMDAIEAssignPacketIdsPass()";
}

def AMDAIEAssignTiles : Pass<"iree-amdaie-assign-tiles", ""> {
let summary = "Assign physical tile locations to logical objFifos.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let summary = "Assign physical tile locations to logical objFifos.";
let summary = "Assign physical tile locations to logical objFifos. Existing assignments will be ignored/replaced.";

maybe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated it.

@yzhang93 yzhang93 merged commit 1faea2f into nod-ai:main Nov 21, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants